R
What is R?

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

R commands

Start R and change the working directory to where the statistics files are found.

    R
    > setwd("/home/student/Genomes/")
    > getwd()
    [1] "/home/student/Genomes"

Reshape table to matrix (heatmap)

This example illustrates a formatting situation that you might run into when working with multiple values per genome. Suppose the data are in long format, one row per State and Year:

    State, Year, Value
    KY, 1998, 56
    KY, 1997, 78
    IL, 1998, 48
    IL, 1997, 72

and I want one row per State, with one value column per Year:

    State, 1997_value, 1998_value
    KY, 78, 56
    IL, 72, 48

You want to use the reshape() function.
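As a minimal sketch, the State/Year table above can be rebuilt and reshaped in a fresh R session as follows. The object names `long` and `wide` are illustrative assumptions, not part of the original page. Note that reshape() names the new columns `Value.1997` and `Value.1998` (value name, a dot, then the Year) rather than `1997_value`, and creates them in the order the years first appear, so the sketch reorders them explicitly.

```r
# Toy copy of the State/Year table from the text (the name `long` is illustrative).
long <- data.frame(
  State = c("KY", "KY", "IL", "IL"),
  Year  = c(1998, 1997, 1998, 1997),
  Value = c(56, 78, 48, 72)
)

# One row per State, one Value column per Year.
wide <- reshape(long, idvar = "State", timevar = "Year", direction = "wide")

# Columns follow the order in which each Year first appears in the input,
# so reorder them to match the desired layout.
wide <- wide[, c("State", "Value.1997", "Value.1998")]
print(wide)
```

If the column names need to match another tool's expectations, they can be changed afterwards with colnames().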
The general form of the call is:

    reshape(data, idvar="State", timevar="Year", direction="wide")

Reference the last column of a data frame

    codon[, length(codon)]

Heatmaps

Codon usage heatmap

To create a heatmap of the codon usage, follow this pipeline. Make sure that your data structures look like the examples below.

    install.packages("gplots")   # only needed once
    library(gplots)

    # Read the long-format table: one row per organism and codon
    codon <- read.table("codonUsage.all")
    colnames(codon) <- c('Name', 'codon', 'score', 'count')
    codon <- codon[1:3]          # keep Name, codon and score; drop count

    # One row per organism, one score column per codon
    test <- reshape(codon, idvar="Name", timevar="codon", direction="wide")

    # Numeric matrix of the score columns, with organism names as row names
    codonMatrix <- data.matrix(test[2:length(test)])
    rownames(codonMatrix) <- test$Name

    codon_heatmap <- heatmap.2(codonMatrix, scale="column", main="Codon usage", xlab="Codon fraction", ylab="Organism", trace="none", margins=c(8, 20))

    # Copy the plot to a PDF file, then close the plotting device
    dev.print(pdf, "codonUsage.pdf")
    dev.off()

The format of each data structure is shown below:

    > codon
                                      Name codon   score
    1 Acidaminococcus_fermentans_DSM_20731   AAA 3.05528
    2 Acidaminococcus_fermentans_DSM_20731   CAA 0.30650
    ........

    > test
                                       Name score.AAA score.CAA score.GAA score.TAA
    1  Acidaminococcus_fermentans_DSM_20731   3.05528   0.30650   5.23985   0.15237
    65   Acidaminococcus_intestini_RyC-MR95   3.02789   0.91191   4.91588   0.16988
    ........

    > codonMatrix
                                         score.AAA score.CAA score.GAA score.TAA score.ACA
    Acidaminococcus_fermentans_DSM_20731   3.05528   0.30650   5.23985   0.15237   0.34499
    Acidaminococcus_intestini_RyC-MR95     3.02789   0.91191   4.91588   0.16988   0.98450

Amino acid heatmap

To create a heatmap of the amino acid usage, follow this pipeline. Make sure that your data structures look like the examples below.
    library(gplots)

    # Read the long-format table: one row per organism and amino acid
    aa <- read.table("aaUsage.all")
    colnames(aa) <- c('Name', 'aa', 'score')

    # One row per organism, one score column per amino acid
    test <- reshape(aa, idvar="Name", timevar="aa", direction="wide")

    # Numeric matrix of the score columns, with organism names as row names
    aaMatrix <- data.matrix(test[2:length(test)])
    rownames(aaMatrix) <- test$Name

    stat_heatmap <- heatmap.2(aaMatrix, scale="column", main="Amino acid usage", xlab="Amino acid fraction", ylab="Organism", trace="none", margins=c(8, 20), col = cm.colors(256))

    # Copy the plot to a PDF file, then close the plotting device
    dev.print(pdf, "aaUsage.pdf")
    dev.off()

The format of each data structure is shown below. The first listing shows aa with the default V1, V2, V3 headers, which suggests it was printed before colnames() was applied.

    > aa
                                        V1 V2     V3
    1 Acidaminococcus_fermentans_DSM_20731  G 8.1275
    2 Acidaminococcus_fermentans_DSM_20731  A 9.0013
    ........

    > str(aa)
    'data.frame':   620 obs. of  3 variables:
     $ Name : Factor w/ 31 levels "Acidaminococcus_fermentans_DSM_20731",..: 1 ...
     $ aa   : Factor w/ 20 levels "A","C","D","E",..: 6 1 18 10 8 5 20 19 7 9 ...
     $ score: num  8.13 9 7.3 10.12 5.8 ...

    > test
                                       Name score.G score.A score.V score.L score.I score.F
    1  Acidaminococcus_fermentans_DSM_20731  8.1275  9.0013  7.2975 10.1203  5.7992  3.8577
    21   Acidaminococcus_intestini_RyC-MR95  7.7623  8.7881  7.0802  9.7019  6.3094  4.0698
    ........

    > aaMatrix
                                         score.G score.A score.V score.L score.I score.F
    Acidaminococcus_fermentans_DSM_20731  8.1275  9.0013  7.2975 10.1203  5.7992  3.8577
    Acidaminococcus_intestini_RyC-MR95    7.7623  8.7881  7.0802  9.7019  6.3094  4.0698
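
Both pipelines above follow the same steps: reshape the long table to wide, drop the Name column, convert the rest to a numeric matrix, and label the rows. The sketch below runs those steps on a small made-up table, so it can be tried without the input files or the gplots package. The organism names `org_A` to `org_C`, the scores, and the object names `usage`, `m` and `z` are hypothetical. The last step uses base R's scale() to compute per-column z-scores, which, as far as I understand heatmap.2, is the transformation that scale="column" applies before colouring.

```r
# Made-up long-format table; names and scores are illustrative only.
usage <- data.frame(
  Name  = rep(c("org_A", "org_B", "org_C"), each = 2),
  aa    = rep(c("G", "A"), times = 3),
  score = c(8.0, 9.0, 7.0, 9.5, 9.0, 8.5)
)

# One row per organism, one score column per amino acid.
wide <- reshape(usage, idvar = "Name", timevar = "aa", direction = "wide")

# Drop the Name column and keep the score columns as a numeric matrix.
m <- data.matrix(wide[2:length(wide)])
rownames(m) <- wide$Name

# A Name/aa combination missing from the input would appear as NA here.
stopifnot(!anyNA(m))

# Column-wise z-scores: each column is centred on 0 with unit standard deviation.
z <- scale(m)
print(round(z, 2))
```

Checking the matrix for NA values before plotting is a cheap way to catch organisms with incomplete input, since the clustering step of a heatmap may fail or mislead on missing cells.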